Overview

Brought to you by YData

Dataset statistics

Statistic                       Dataset A   Dataset B
Number of variables             12          12
Number of observations          446         446
Missing cells                   423         432
Missing cells (%)               7.9%        8.1%
Duplicate rows                  0           0
Duplicate rows (%)              0.0%        0.0%
Total size in memory            45.3 KiB    45.3 KiB
Average record size in memory   104.0 B     104.0 B
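
The missing-cell percentages can be cross-checked from the counts in the table. The following is a minimal sketch, assuming the percentage is the number of missing cells divided by the total number of cells (observations times variables); the helper name `missing_cell_pct` is illustrative and not part of ydata-profiling.

```python
def missing_cell_pct(missing_cells: int, n_rows: int, n_cols: int) -> float:
    """Share of missing cells among all cells (rows x columns), in percent."""
    return 100.0 * missing_cells / (n_rows * n_cols)


# Counts copied from the Dataset statistics table.
pct_a = missing_cell_pct(423, n_rows=446, n_cols=12)
pct_b = missing_cell_pct(432, n_rows=446, n_cols=12)

print(f"Dataset A: {pct_a:.1f}%  Dataset B: {pct_b:.1f}%")
```

Rounded to one decimal place, both values agree with the "Missing cells (%)" row.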

Variable types

Variable type   Dataset A   Dataset B
Numeric         5           5
Categorical     4           4
Text            3           3

Alerts

Dataset A                                        Dataset B                                        Alert type
Sex is highly overall correlated with Survived   Sex is highly overall correlated with Survived   High correlation
Survived is highly overall correlated with Sex   Survived is highly overall correlated with Sex   High correlation
Age has 81 (18.2%) missing values                Age has 86 (19.3%) missing values                Missing
Cabin has 340 (76.2%) missing values             Cabin has 345 (77.4%) missing values             Missing
PassengerId has unique values                    PassengerId has unique values                    Unique
Name has unique values                           Name has unique values                           Unique
SibSp has 299 (67.0%) zeros                      SibSp has 305 (68.4%) zeros                      Zeros
Parch has 345 (77.4%) zeros                      Parch has 345 (77.4%) zeros                      Zeros
Fare has 8 (1.8%) zeros                          Fare has 10 (2.2%) zeros                         Zeros

Reproduction

Property                 Dataset A                    Dataset B
Analysis started         2025-03-18 20:27:21.796862   2025-03-18 20:27:23.908053
Analysis finished        2025-03-18 20:27:23.905312   2025-03-18 20:27:25.990366
Duration                 2.11 seconds                 2.08 seconds
Software version         ydata-profiling v0.0.dev0    ydata-profiling v0.0.dev0
Download configuration   config.json                  config.json
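
The reported durations follow from the start and finish timestamps. A small sketch, assuming the timestamps use the `YYYY-MM-DD HH:MM:SS.ffffff` layout shown in the table; the helper name `duration_seconds` is hypothetical.

```python
from datetime import datetime

# Timestamp layout as it appears in the Reproduction table (assumed).
FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed wall-clock time between two timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


dur_a = duration_seconds("2025-03-18 20:27:21.796862", "2025-03-18 20:27:23.905312")
dur_b = duration_seconds("2025-03-18 20:27:23.908053", "2025-03-18 20:27:25.990366")

print(f"Dataset A: {dur_a:.2f} s  Dataset B: {dur_b:.2f} s")
```

To two decimal places the results match the "Duration" row.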

Variables

PassengerId
Real number (ℝ)

Statistic      Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           444.0426    446.19283
Minimum        2           3
Maximum        891         891
Zeros          0           0
Zeros (%)      0.0%        0.0%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     2           3
5-th percentile             48.5        59.5
Q1                          231.5       235.25
median                      428.5       446.5
Q3                          653.75      649.75
95-th percentile            853.75      845.5
Maximum                     891         891
Range                       889         888
Interquartile range (IQR)   422.25      414.5
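
The Range and IQR rows are derived from the other quantile rows: range is maximum minus minimum, and IQR is Q3 minus Q1. A short sketch using the PassengerId figures; the function name `spread` is illustrative only.

```python
def spread(minimum: float, q1: float, q3: float, maximum: float) -> dict:
    """Range (max - min) and interquartile range (Q3 - Q1)."""
    return {"range": maximum - minimum, "iqr": q3 - q1}


# PassengerId quantiles copied from the table.
spread_a = spread(minimum=2, q1=231.5, q3=653.75, maximum=891)
spread_b = spread(minimum=3, q1=235.25, q3=649.75, maximum=891)

print(spread_a)
print(spread_b)
```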

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                253.55235       247.19408
Coefficient of variation (CV)     0.57100907      0.55400731
Kurtosis                          -1.1706937      -1.0945158
Mean                              444.0426        446.19283
Median Absolute Deviation (MAD)   210             209
Skewness                          0.036204644     0.034319144
Sum                               198043          199002
Variance                          64288.796       61104.916
Monotonicity                      Not monotonic   Not monotonic
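
Several descriptive statistics are tied together by standard identities: the mean is the sum divided by the number of non-missing observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below recomputes them from the PassengerId figures; because the report prints rounded values, the recomputed numbers agree only approximately. The variable names are illustrative.

```python
# PassengerId figures copied from the report (446 non-missing observations each).
n = 446
stats = {
    "A": {"sum": 198043, "std": 253.55235},
    "B": {"sum": 199002, "std": 247.19408},
}

derived = {}
for name, s in stats.items():
    mean = s["sum"] / n        # mean = sum / count
    cv = s["std"] / mean       # coefficient of variation = std / mean
    variance = s["std"] ** 2   # variance = std squared
    derived[name] = {"mean": mean, "cv": cv, "variance": variance}
    print(f"Dataset {name}: mean={mean:.4f}  cv={cv:.4f}  variance={variance:.2f}")
```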
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
350 1
 
0.2%
357 1
 
0.2%
755 1
 
0.2%
159 1
 
0.2%
406 1
 
0.2%
118 1
 
0.2%
545 1
 
0.2%
764 1
 
0.2%
533 1
 
0.2%
247 1
 
0.2%
Other values (436) 436
97.8%
ValueCountFrequency (%)
90 1
 
0.2%
572 1
 
0.2%
103 1
 
0.2%
707 1
 
0.2%
205 1
 
0.2%
134 1
 
0.2%
198 1
 
0.2%
104 1
 
0.2%
676 1
 
0.2%
653 1
 
0.2%
Other values (436) 436
97.8%
ValueCountFrequency (%)
2 1
0.2%
4 1
0.2%
5 1
0.2%
8 1
0.2%
13 1
0.2%
16 1
0.2%
18 1
0.2%
19 1
0.2%
21 1
0.2%
22 1
0.2%
ValueCountFrequency (%)
3 1
0.2%
5 1
0.2%
6 1
0.2%
12 1
0.2%
13 1
0.2%
18 1
0.2%
20 1
0.2%
23 1
0.2%
24 1
0.2%
29 1
0.2%

Survived
Categorical

Statistic      Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Dataset A counts: 0 (272), 1 (174)
Dataset B counts: 0 (288), 1 (158)

Length

Statistic       Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      446         446
Distinct characters   2           2
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

Row       Dataset A   Dataset B
1st row   1           1
2nd row   1           0
3rd row   0           1
4th row   0           1
5th row   0           1

Common Values

ValueCountFrequency (%)
0 272
61.0%
1 174
39.0%
ValueCountFrequency (%)
0 288
64.6%
1 158
35.4%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figures: common values bar charts for Dataset A and Dataset B]
ValueCountFrequency (%)
0 272
61.0%
1 174
39.0%
ValueCountFrequency (%)
0 288
64.6%
1 158
35.4%

Most occurring characters

ValueCountFrequency (%)
0 272
61.0%
1 174
39.0%
ValueCountFrequency (%)
0 288
64.6%
1 158
35.4%

Most occurring categories

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
0 272
61.0%
1 174
39.0%
ValueCountFrequency (%)
0 288
64.6%
1 158
35.4%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
0 272
61.0%
1 174
39.0%
ValueCountFrequency (%)
0 288
64.6%
1 158
35.4%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
0 272
61.0%
1 174
39.0%
ValueCountFrequency (%)
0 288
64.6%
1 158
35.4%

Pclass
Categorical

Statistic      Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Dataset A counts: 3 (231), 1 (112), 2 (103)
Dataset B counts: 3 (237), 1 (111), 2 (98)

Length

Statistic       Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      446         446
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

Row       Dataset A   Dataset B
1st row   1           1
2nd row   2           1
3rd row   3           2
4th row   2           3
5th row   2           2

Common Values

ValueCountFrequency (%)
3 231
51.8%
1 112
25.1%
2 103
23.1%
ValueCountFrequency (%)
3 237
53.1%
1 111
24.9%
2 98
22.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figures: common values bar charts for Dataset A and Dataset B]
ValueCountFrequency (%)
3 231
51.8%
1 112
25.1%
2 103
23.1%
ValueCountFrequency (%)
3 237
53.1%
1 111
24.9%
2 98
22.0%

Most occurring characters

ValueCountFrequency (%)
3 231
51.8%
1 112
25.1%
2 103
23.1%
ValueCountFrequency (%)
3 237
53.1%
1 111
24.9%
2 98
22.0%

Most occurring categories

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
3 231
51.8%
1 112
25.1%
2 103
23.1%
ValueCountFrequency (%)
3 237
53.1%
1 111
24.9%
2 98
22.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
3 231
51.8%
1 112
25.1%
2 103
23.1%
ValueCountFrequency (%)
3 237
53.1%
1 111
24.9%
2 98
22.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
3 231
51.8%
1 112
25.1%
2 103
23.1%
ValueCountFrequency (%)
3 237
53.1%
1 111
24.9%
2 98
22.0%

Name
Text

Statistic      Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

Statistic       Dataset A   Dataset B
Max length      82          67
Median length   52          50
Mean length     27.558296   26.878924
Min length      13          13

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      12291       11988
Distinct characters   59          59
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       446         446
Unique (%)   100.0%      100.0%

Sample

Row       Dataset A                          Dataset B
1st row   Bowerman, Miss. Elsie Edith        Appleton, Mrs. Edward Dale (Charlotte Lamson)
2nd row   Herman, Mrs. Samuel (Jane Laver)   White, Mr. Richard Frasar
3rd row   Smiljanic, Mr. Mile                Kelly, Mrs. Florence "Fannie"
4th row   Gale, Mr. Shadrach                 Cohen, Mr. Gurshon "Gus"
5th row   Turpin, Mr. William John Robert    Weisz, Mrs. Leopold (Mathilde Francoise Pede)
ValueCountFrequency (%)
mr 256
 
13.9%
miss 92
 
5.0%
mrs 68
 
3.7%
william 45
 
2.4%
john 21
 
1.1%
master 20
 
1.1%
henry 19
 
1.0%
mary 13
 
0.7%
thomas 12
 
0.6%
george 11
 
0.6%
Other values (898) 1291
69.9%
ValueCountFrequency (%)
mr 263
 
14.5%
miss 93
 
5.1%
mrs 66
 
3.7%
william 25
 
1.4%
john 23
 
1.3%
henry 21
 
1.2%
master 15
 
0.8%
richard 13
 
0.7%
george 13
 
0.7%
james 13
 
0.7%
Other values (907) 1263
69.9%

Most occurring characters

ValueCountFrequency (%)
1403
 
11.4%
r 1003
 
8.2%
e 867
 
7.1%
a 863
 
7.0%
i 703
 
5.7%
s 658
 
5.4%
n 647
 
5.3%
l 589
 
4.8%
M 560
 
4.6%
o 484
 
3.9%
Other values (49) 4514
36.7%
ValueCountFrequency (%)
1362
 
11.4%
r 993
 
8.3%
e 855
 
7.1%
a 825
 
6.9%
n 680
 
5.7%
i 657
 
5.5%
s 654
 
5.5%
M 569
 
4.7%
l 520
 
4.3%
o 492
 
4.1%
Other values (49) 4381
36.5%

Most occurring categories

ValueCountFrequency (%)
(unknown) 12291
100.0%
ValueCountFrequency (%)
(unknown) 11988
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
1403
 
11.4%
r 1003
 
8.2%
e 867
 
7.1%
a 863
 
7.0%
i 703
 
5.7%
s 658
 
5.4%
n 647
 
5.3%
l 589
 
4.8%
M 560
 
4.6%
o 484
 
3.9%
Other values (49) 4514
36.7%
ValueCountFrequency (%)
1362
 
11.4%
r 993
 
8.3%
e 855
 
7.1%
a 825
 
6.9%
n 680
 
5.7%
i 657
 
5.5%
s 654
 
5.5%
M 569
 
4.7%
l 520
 
4.3%
o 492
 
4.1%
Other values (49) 4381
36.5%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 12291
100.0%
ValueCountFrequency (%)
(unknown) 11988
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
1403
 
11.4%
r 1003
 
8.2%
e 867
 
7.1%
a 863
 
7.0%
i 703
 
5.7%
s 658
 
5.4%
n 647
 
5.3%
l 589
 
4.8%
M 560
 
4.6%
o 484
 
3.9%
Other values (49) 4514
36.7%
ValueCountFrequency (%)
1362
 
11.4%
r 993
 
8.3%
e 855
 
7.1%
a 825
 
6.9%
n 680
 
5.7%
i 657
 
5.5%
s 654
 
5.5%
M 569
 
4.7%
l 520
 
4.3%
o 492
 
4.1%
Other values (49) 4381
36.5%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 12291
100.0%
ValueCountFrequency (%)
(unknown) 11988
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
1403
 
11.4%
r 1003
 
8.2%
e 867
 
7.1%
a 863
 
7.0%
i 703
 
5.7%
s 658
 
5.4%
n 647
 
5.3%
l 589
 
4.8%
M 560
 
4.6%
o 484
 
3.9%
Other values (49) 4514
36.7%
ValueCountFrequency (%)
1362
 
11.4%
r 993
 
8.3%
e 855
 
7.1%
a 825
 
6.9%
n 680
 
5.7%
i 657
 
5.5%
s 654
 
5.5%
M 569
 
4.7%
l 520
 
4.3%
o 492
 
4.1%
Other values (49) 4381
36.5%

Sex
Categorical

Statistic      Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Dataset A counts: male (285), female (161)
Dataset B counts: male (285), female (161)

Length

Statistic       Dataset A   Dataset B
Max length      6           6
Median length   4           4
Mean length     4.7219731   4.7219731
Min length      4           4

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      2106        2106
Distinct characters   5           5
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

Row       Dataset A   Dataset B
1st row   female      female
2nd row   female      male
3rd row   male        female
4th row   male        male
5th row   male        female

Common Values

ValueCountFrequency (%)
male 285
63.9%
female 161
36.1%
ValueCountFrequency (%)
male 285
63.9%
female 161
36.1%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figures: common values bar charts for Dataset A and Dataset B]
ValueCountFrequency (%)
male 285
63.9%
female 161
36.1%
ValueCountFrequency (%)
male 285
63.9%
female 161
36.1%

Most occurring characters

ValueCountFrequency (%)
e 607
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 161
 
7.6%
ValueCountFrequency (%)
e 607
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 161
 
7.6%

Most occurring categories

ValueCountFrequency (%)
(unknown) 2106
100.0%
ValueCountFrequency (%)
(unknown) 2106
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
e 607
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 161
 
7.6%
ValueCountFrequency (%)
e 607
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 161
 
7.6%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 2106
100.0%
ValueCountFrequency (%)
(unknown) 2106
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
e 607
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 161
 
7.6%
ValueCountFrequency (%)
e 607
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 161
 
7.6%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 2106
100.0%
ValueCountFrequency (%)
(unknown) 2106
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
e 607
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 161
 
7.6%
ValueCountFrequency (%)
e 607
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 161
 
7.6%

Age
Real number (ℝ)

Statistic      Dataset A   Dataset B
Distinct       74          77
Distinct (%)   20.3%       21.4%
Missing        81          86
Missing (%)    18.2%       19.3%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           29.600685   30.413639
Minimum        0.75        0.75
Maximum        80          80
Zeros          0           0
Zeros (%)      0.0%        0.0%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     0.75        0.75
5-th percentile             5           6
Q1                          21          21
median                      28.5        29
Q3                          38          39
95-th percentile            56          57
Maximum                     80          80
Range                       79.25       79.25
Interquartile range (IQR)   17          18

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                14.166069       14.326524
Coefficient of variation (CV)     0.47857233      0.4710559
Kurtosis                          0.35899321      0.38199735
Mean                              29.600685       30.413639
Median Absolute Deviation (MAD)   8.5             9
Skewness                          0.43041228      0.44526272
Sum                               10804.25        10948.91
Variance                          200.67751       205.24929
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
22 16
 
3.6%
24 15
 
3.4%
36 13
 
2.9%
29 13
 
2.9%
28 13
 
2.9%
30 12
 
2.7%
25 12
 
2.7%
18 12
 
2.7%
32 11
 
2.5%
35 11
 
2.5%
Other values (64) 237
53.1%
(Missing) 81
 
18.2%
ValueCountFrequency (%)
22 15
 
3.4%
18 14
 
3.1%
21 13
 
2.9%
28 13
 
2.9%
25 13
 
2.9%
30 13
 
2.9%
24 13
 
2.9%
32 12
 
2.7%
19 11
 
2.5%
26 11
 
2.5%
Other values (67) 232
52.0%
(Missing) 86
 
19.3%
ValueCountFrequency (%)
0.75 1
 
0.2%
1 4
0.9%
2 4
0.9%
3 4
0.9%
4 4
0.9%
5 4
0.9%
6 2
0.4%
7 2
0.4%
8 1
 
0.2%
9 3
0.7%
ValueCountFrequency (%)
0.75 1
 
0.2%
0.83 2
 
0.4%
1 2
 
0.4%
2 3
0.7%
3 1
 
0.2%
4 6
1.3%
5 2
 
0.4%
6 2
 
0.4%
7 2
 
0.4%
8 2
 
0.4%

SibSp
Real number (ℝ)

Statistic      Dataset A    Dataset B
Distinct       7            7
Distinct (%)   1.6%         1.6%
Missing        0            0
Missing (%)    0.0%         0.0%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           0.53363229   0.50224215
Minimum        0            0
Maximum        8            8
Zeros          299          305
Zeros (%)      67.0%        68.4%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          1           1
95-th percentile            2           2
Maximum                     8           8
Range                       8           8
Interquartile range (IQR)   1           1

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                1.1287813       1.0380381
Coefficient of variation (CV)     2.1152792       2.066808
Kurtosis                          18.825199       19.569565
Mean                              0.53363229      0.50224215
Median Absolute Deviation (MAD)   0               0
Skewness                          3.84297         3.7477421
Sum                               238             224
Variance                          1.2741472       1.0775231
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=7)
ValueCountFrequency (%)
0 299
67.0%
1 112
 
25.1%
2 15
 
3.4%
4 6
 
1.3%
5 5
 
1.1%
3 5
 
1.1%
8 4
 
0.9%
ValueCountFrequency (%)
0 305
68.4%
1 103
 
23.1%
2 18
 
4.0%
3 8
 
1.8%
4 8
 
1.8%
8 3
 
0.7%
5 1
 
0.2%
ValueCountFrequency (%)
0 299
67.0%
1 112
 
25.1%
2 15
 
3.4%
3 5
 
1.1%
4 6
 
1.3%
5 5
 
1.1%
8 4
 
0.9%
ValueCountFrequency (%)
0 305
68.4%
1 103
 
23.1%
2 18
 
4.0%
3 8
 
1.8%
4 8
 
1.8%
5 1
 
0.2%
8 3
 
0.7%

Parch
Real number (ℝ)

Statistic      Dataset A    Dataset B
Distinct       6            7
Distinct (%)   1.3%         1.6%
Missing        0            0
Missing (%)    0.0%         0.0%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           0.37668161   0.367713
Minimum        0            0
Maximum        5            6
Zeros          345          345
Zeros (%)      77.4%        77.4%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          0           0
95-th percentile            2           2
Maximum                     5           6
Range                       5           6
Interquartile range (IQR)   0           0

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                0.83291486      0.81515845
Coefficient of variation (CV)     2.2111906       2.2168333
Kurtosis                          9.8442571       11.54414
Mean                              0.37668161      0.367713
Median Absolute Deviation (MAD)   0               0
Skewness                          2.8568304       2.9823302
Sum                               168             164
Variance                          0.69374717      0.6644833
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=6)
ValueCountFrequency (%)
0 345
77.4%
1 54
 
12.1%
2 38
 
8.5%
5 4
 
0.9%
4 3
 
0.7%
3 2
 
0.4%
ValueCountFrequency (%)
0 345
77.4%
1 56
 
12.6%
2 37
 
8.3%
4 3
 
0.7%
5 2
 
0.4%
3 2
 
0.4%
6 1
 
0.2%
ValueCountFrequency (%)
0 345
77.4%
1 54
 
12.1%
2 38
 
8.5%
3 2
 
0.4%
4 3
 
0.7%
5 4
 
0.9%
ValueCountFrequency (%)
0 345
77.4%
1 56
 
12.6%
2 37
 
8.3%
3 2
 
0.4%
4 3
 
0.7%
5 2
 
0.4%
6 1
 
0.2%

Ticket
Text

Statistic      Dataset A   Dataset B
Distinct       383         389
Distinct (%)   85.9%       87.2%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

Statistic       Dataset A   Dataset B
Max length      18          18
Median length   17          17
Mean length     6.7331839   6.7421525
Min length      3           3

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      3003        3007
Distinct characters   32          32
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       335         342
Unique (%)   75.1%       76.7%

Sample

Row       Dataset A   Dataset B
1st row   113505      11769
2nd row   220845      35281
3rd row   315037      223596
4th row   28664       A/5 3540
5th row   11668       228414
ValueCountFrequency (%)
pc 32
 
5.6%
c.a 14
 
2.5%
ca 10
 
1.8%
a/5 8
 
1.4%
2 7
 
1.2%
ston/o 7
 
1.2%
2144 5
 
0.9%
sc/paris 5
 
0.9%
s.o.c 4
 
0.7%
soton/oq 4
 
0.7%
Other values (400) 471
83.1%
ValueCountFrequency (%)
pc 27
 
4.8%
c.a 12
 
2.1%
a/5 11
 
2.0%
ston/o 6
 
1.1%
2 6
 
1.1%
soton/o.q 6
 
1.1%
ca 6
 
1.1%
w./c 6
 
1.1%
347088 4
 
0.7%
347082 4
 
0.7%
Other values (408) 472
84.3%

Most occurring characters

ValueCountFrequency (%)
3 363
12.1%
1 341
11.4%
2 302
10.1%
7 249
8.3%
4 228
 
7.6%
6 222
 
7.4%
5 195
 
6.5%
0 191
 
6.4%
9 167
 
5.6%
8 144
 
4.8%
Other values (22) 601
20.0%
ValueCountFrequency (%)
3 393
13.1%
1 337
11.2%
2 300
10.0%
7 250
8.3%
4 228
 
7.6%
0 205
 
6.8%
6 195
 
6.5%
5 194
 
6.5%
9 162
 
5.4%
8 154
 
5.1%
Other values (22) 589
19.6%

Most occurring categories

ValueCountFrequency (%)
(unknown) 3003
100.0%
ValueCountFrequency (%)
(unknown) 3007
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
3 363
12.1%
1 341
11.4%
2 302
10.1%
7 249
8.3%
4 228
 
7.6%
6 222
 
7.4%
5 195
 
6.5%
0 191
 
6.4%
9 167
 
5.6%
8 144
 
4.8%
Other values (22) 601
20.0%
ValueCountFrequency (%)
3 393
13.1%
1 337
11.2%
2 300
10.0%
7 250
8.3%
4 228
 
7.6%
0 205
 
6.8%
6 195
 
6.5%
5 194
 
6.5%
9 162
 
5.4%
8 154
 
5.1%
Other values (22) 589
19.6%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 3003
100.0%
ValueCountFrequency (%)
(unknown) 3007
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
3 363
12.1%
1 341
11.4%
2 302
10.1%
7 249
8.3%
4 228
 
7.6%
6 222
 
7.4%
5 195
 
6.5%
0 191
 
6.4%
9 167
 
5.6%
8 144
 
4.8%
Other values (22) 601
20.0%
ValueCountFrequency (%)
3 393
13.1%
1 337
11.2%
2 300
10.0%
7 250
8.3%
4 228
 
7.6%
0 205
 
6.8%
6 195
 
6.5%
5 194
 
6.5%
9 162
 
5.4%
8 154
 
5.1%
Other values (22) 589
19.6%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 3003
100.0%
ValueCountFrequency (%)
(unknown) 3007
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
3 363
12.1%
1 341
11.4%
2 302
10.1%
7 249
8.3%
4 228
 
7.6%
6 222
 
7.4%
5 195
 
6.5%
0 191
 
6.4%
9 167
 
5.6%
8 144
 
4.8%
Other values (22) 601
20.0%
ValueCountFrequency (%)
3 393
13.1%
1 337
11.2%
2 300
10.0%
7 250
8.3%
4 228
 
7.6%
0 205
 
6.8%
6 195
 
6.5%
5 194
 
6.5%
9 162
 
5.4%
8 154
 
5.1%
Other values (22) 589
19.6%

Fare
Real number (ℝ)

Statistic      Dataset A   Dataset B
Distinct       176         185
Distinct (%)   39.5%       41.5%
Missing        0           0
Missing (%)    0.0%        0.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           33.529167   33.342104
Minimum        0           0
Maximum        512.3292    512.3292
Zeros          8           10
Zeros (%)      1.8%        2.2%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     0           0
5-th percentile             7.225       7.0719
Q1                          8.05        7.925
median                      15.2458     14.47915
Q3                          30.5        30.5
95-th percentile            118.31875   130.2375
Maximum                     512.3292    512.3292
Range                       512.3292    512.3292
Interquartile range (IQR)   22.45       22.575

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                56.203453       53.993906
Coefficient of variation (CV)     1.6762556       1.6193911
Kurtosis                          36.90768        31.015335
Mean                              33.529167       33.342104
Median Absolute Deviation (MAD)   7.7229          7.22915
Skewness                          5.2957188       4.7355989
Sum                               14954.008       14870.579
Variance                          3158.8282       2915.3419
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
13 29
 
6.5%
8.05 22
 
4.9%
26 19
 
4.3%
7.8958 16
 
3.6%
7.75 12
 
2.7%
10.5 12
 
2.7%
7.775 9
 
2.0%
7.2292 9
 
2.0%
7.925 9
 
2.0%
26.55 9
 
2.0%
Other values (166) 300
67.3%
ValueCountFrequency (%)
13 26
 
5.8%
7.8958 19
 
4.3%
8.05 18
 
4.0%
26 15
 
3.4%
7.75 15
 
3.4%
0 10
 
2.2%
10.5 9
 
2.0%
8.6625 8
 
1.8%
26.55 8
 
1.8%
7.925 8
 
1.8%
Other values (175) 310
69.5%
ValueCountFrequency (%)
0 8
1.8%
5 1
 
0.2%
6.2375 1
 
0.2%
6.4958 1
 
0.2%
6.75 1
 
0.2%
6.8583 1
 
0.2%
6.975 1
 
0.2%
7.05 4
0.9%
7.0542 2
 
0.4%
7.125 2
 
0.4%
ValueCountFrequency (%)
0 10
2.2%
4.0125 1
 
0.2%
6.4375 1
 
0.2%
6.4958 2
 
0.4%
6.8583 1
 
0.2%
6.95 1
 
0.2%
7.0458 1
 
0.2%
7.05 5
1.1%
7.0542 1
 
0.2%
7.125 3
 
0.7%

Cabin
Text

Statistic      Dataset A   Dataset B
Distinct       85          89
Distinct (%)   80.2%       88.1%
Missing        340         345
Missing (%)    76.2%       77.4%
Memory size    7.0 KiB     7.0 KiB

Length

Statistic       Dataset A   Dataset B
Max length      15          15
Median length   3           3
Mean length     3.6886792   3.6336634
Min length      1           1

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      391         367
Distinct characters   19          19
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       68          77
Unique (%)   64.2%       76.2%

Sample

Row       Dataset A   Dataset B
1st row   E33         C101
2nd row   C86         D26
3rd row   B96 B98     E8
4th row   E101        B22
5th row   F33         B39
ValueCountFrequency (%)
b96 4
 
3.1%
b98 4
 
3.1%
e101 3
 
2.4%
c23 3
 
2.4%
c25 3
 
2.4%
c27 3
 
2.4%
f 3
 
2.4%
f33 2
 
1.6%
e8 2
 
1.6%
b18 2
 
1.6%
Other values (85) 98
77.2%
ValueCountFrequency (%)
e67 2
 
1.7%
d20 2
 
1.7%
c23 2
 
1.7%
c25 2
 
1.7%
c27 2
 
1.7%
e101 2
 
1.7%
f4 2
 
1.7%
d35 2
 
1.7%
e44 2
 
1.7%
c2 2
 
1.7%
Other values (89) 98
83.1%

Most occurring characters

ValueCountFrequency (%)
1 38
 
9.7%
B 37
 
9.5%
C 35
 
9.0%
2 35
 
9.0%
3 29
 
7.4%
5 28
 
7.2%
6 24
 
6.1%
8 21
 
5.4%
21
 
5.4%
E 19
 
4.9%
Other values (9) 104
26.6%
ValueCountFrequency (%)
C 37
 
10.1%
1 37
 
10.1%
3 35
 
9.5%
2 34
 
9.3%
B 29
 
7.9%
6 23
 
6.3%
7 19
 
5.2%
0 19
 
5.2%
E 18
 
4.9%
4 18
 
4.9%
Other values (9) 98
26.7%

Most occurring categories

ValueCountFrequency (%)
(unknown) 391
100.0%
ValueCountFrequency (%)
(unknown) 367
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
1 38
 
9.7%
B 37
 
9.5%
C 35
 
9.0%
2 35
 
9.0%
3 29
 
7.4%
5 28
 
7.2%
6 24
 
6.1%
8 21
 
5.4%
21
 
5.4%
E 19
 
4.9%
Other values (9) 104
26.6%
ValueCountFrequency (%)
C 37
 
10.1%
1 37
 
10.1%
3 35
 
9.5%
2 34
 
9.3%
B 29
 
7.9%
6 23
 
6.3%
7 19
 
5.2%
0 19
 
5.2%
E 18
 
4.9%
4 18
 
4.9%
Other values (9) 98
26.7%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 391
100.0%
ValueCountFrequency (%)
(unknown) 367
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
1 38
 
9.7%
B 37
 
9.5%
C 35
 
9.0%
2 35
 
9.0%
3 29
 
7.4%
5 28
 
7.2%
6 24
 
6.1%
8 21
 
5.4%
21
 
5.4%
E 19
 
4.9%
Other values (9) 104
26.6%
ValueCountFrequency (%)
C 37
 
10.1%
1 37
 
10.1%
3 35
 
9.5%
2 34
 
9.3%
B 29
 
7.9%
6 23
 
6.3%
7 19
 
5.2%
0 19
 
5.2%
E 18
 
4.9%
4 18
 
4.9%
Other values (9) 98
26.7%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 391
100.0%
ValueCountFrequency (%)
(unknown) 367
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
1 38
 
9.7%
B 37
 
9.5%
C 35
 
9.0%
2 35
 
9.0%
3 29
 
7.4%
5 28
 
7.2%
6 24
 
6.1%
8 21
 
5.4%
21
 
5.4%
E 19
 
4.9%
Other values (9) 104
26.6%
ValueCountFrequency (%)
C 37
 
10.1%
1 37
 
10.1%
3 35
 
9.5%
2 34
 
9.3%
B 29
 
7.9%
6 23
 
6.3%
7 19
 
5.2%
0 19
 
5.2%
E 18
 
4.9%
4 18
 
4.9%
Other values (9) 98
26.7%

Embarked
Categorical

Statistic      Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        2           1
Missing (%)    0.4%        0.2%
Memory size    7.0 KiB     7.0 KiB

Dataset A counts: S (333), C (79), Q (32)
Dataset B counts: S (330), C (76), Q (39)

Length

Statistic       Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      444         445
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

Row       Dataset A   Dataset B
1st row   S           S
2nd row   S           S
3rd row   S           S
4th row   S           S
5th row   S           S

Common Values

ValueCountFrequency (%)
S 333
74.7%
C 79
 
17.7%
Q 32
 
7.2%
(Missing) 2
 
0.4%
ValueCountFrequency (%)
S 330
74.0%
C 76
 
17.0%
Q 39
 
8.7%
(Missing) 1
 
0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figures: common values bar charts for Dataset A and Dataset B]
ValueCountFrequency (%)
s 333
75.0%
c 79
 
17.8%
q 32
 
7.2%
ValueCountFrequency (%)
s 330
74.2%
c 76
 
17.1%
q 39
 
8.8%

Most occurring characters

ValueCountFrequency (%)
S 333
75.0%
C 79
 
17.8%
Q 32
 
7.2%
ValueCountFrequency (%)
S 330
74.2%
C 76
 
17.1%
Q 39
 
8.8%

Most occurring categories

ValueCountFrequency (%)
(unknown) 444
100.0%
ValueCountFrequency (%)
(unknown) 445
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
S 333
75.0%
C 79
 
17.8%
Q 32
 
7.2%
ValueCountFrequency (%)
S 330
74.2%
C 76
 
17.1%
Q 39
 
8.8%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 444
100.0%
ValueCountFrequency (%)
(unknown) 445
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
S 333
75.0%
C 79
 
17.8%
Q 32
 
7.2%
ValueCountFrequency (%)
S 330
74.2%
C 76
 
17.1%
Q 39
 
8.8%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 444
100.0%
ValueCountFrequency (%)
(unknown) 445
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
S 333
75.0%
C 79
 
17.8%
Q 32
 
7.2%
ValueCountFrequency (%)
S 330
74.2%
C 76
 
17.1%
Q 39
 
8.8%

Interactions

[Figures: pairwise interaction plots for Dataset A and Dataset B]

2025-03-18T20:27:22.911516image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset B

2025-03-18T20:27:24.975144image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset A

2025-03-18T20:27:23.308034image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset B

2025-03-18T20:27:25.399107image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Correlations

[Correlation plots for Dataset A and Dataset B (Matplotlib v3.10.0) are not preserved in this text version.]

Dataset A

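Variable | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived
Age | 1.000 | 0.083 | 0.156 | -0.276 | 0.001 | 0.261 | 0.000 | -0.169 | 0.229
Embarked | 0.083 | 1.000 | 0.210 | 0.031 | 0.000 | 0.242 | 0.143 | 0.000 | 0.200
Fare | 0.156 | 0.210 | 1.000 | 0.390 | -0.028 | 0.449 | 0.231 | 0.426 | 0.298
Parch | -0.276 | 0.031 | 0.390 | 1.000 | 0.038 | 0.000 | 0.248 | 0.439 | 0.200
PassengerId | 0.001 | 0.000 | -0.028 | 0.038 | 1.000 | 0.114 | 0.049 | -0.090 | 0.150
Pclass | 0.261 | 0.242 | 0.449 | 0.000 | 0.114 | 1.000 | 0.170 | 0.108 | 0.345
Sex | 0.000 | 0.143 | 0.231 | 0.248 | 0.049 | 0.170 | 1.000 | 0.185 | 0.531
SibSp | -0.169 | 0.000 | 0.426 | 0.439 | -0.090 | 0.108 | 0.185 | 1.000 | 0.141
Survived | 0.229 | 0.200 | 0.298 | 0.200 | 0.150 | 0.345 | 0.531 | 0.141 | 1.000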

Dataset B

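Variable | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived
Age | 1.000 | 0.069 | 0.104 | -0.284 | 0.091 | 0.214 | 0.019 | -0.167 | 0.091
Embarked | 0.069 | 1.000 | 0.184 | 0.000 | 0.000 | 0.259 | 0.024 | 0.014 | 0.095
Fare | 0.104 | 0.184 | 1.000 | 0.391 | 0.020 | 0.478 | 0.191 | 0.437 | 0.273
Parch | -0.284 | 0.000 | 0.391 | 1.000 | 0.040 | 0.000 | 0.293 | 0.457 | 0.172
PassengerId | 0.091 | 0.000 | 0.020 | 0.040 | 1.000 | 0.032 | 0.000 | -0.062 | 0.045
Pclass | 0.214 | 0.259 | 0.478 | 0.000 | 0.032 | 1.000 | 0.150 | 0.104 | 0.362
Sex | 0.019 | 0.024 | 0.191 | 0.293 | 0.000 | 0.150 | 1.000 | 0.255 | 0.569
SibSp | -0.167 | 0.014 | 0.437 | 0.457 | -0.062 | 0.104 | 0.255 | 1.000 | 0.112
Survived | 0.091 | 0.095 | 0.273 | 0.172 | 0.045 | 0.362 | 0.569 | 0.112 | 1.000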

Missing values

Dataset A and Dataset B (three figures each; images not preserved in this text version):

A simple visualization of nullity by column.

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
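To make the idea of nullity correlation concrete, here is a minimal sketch in plain Python. It assumes that nullity correlation means the Pearson correlation between the 0/1 missingness indicators of two columns. The function name `nullity_correlation` and the toy columns are hypothetical and are not taken from the report or from the profiling library.

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns.

    Missing values are represented as None (an assumption of this sketch).
    """
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Undefined when a column is fully present or fully missing.
        return None
    return cov / sqrt(var_a * var_b)

# Hypothetical toy columns (not the report's data).
age = [22.0, None, 34.0, None, 50.0, 17.0]
cabin = ["E33", None, None, None, "C86", None]
embarked = ["S", "S", "S", "S", "C", "C"]

print(nullity_correlation(age, cabin))     # positive: the two tend to be missing together
print(nullity_correlation(age, embarked))  # None: embarked has no missing values here
```

A value near 1 means the two columns tend to be missing in the same rows, a value near -1 means one tends to be present when the other is missing, and columns with no variation in missingness have no defined value.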

Sample

Dataset A

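(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
356 | 357 | 1 | 1 | Bowerman, Miss. Elsie Edith | female | 22.0 | 0 | 1 | 113505 | 55.0000 | E33 | S
754 | 755 | 1 | 2 | Herman, Mrs. Samuel (Jane Laver) | female | 48.0 | 1 | 2 | 220845 | 65.0000 | NaN | S
158 | 159 | 0 | 3 | Smiljanic, Mr. Mile | male | NaN | 0 | 0 | 315037 | 8.6625 | NaN | S
405 | 406 | 0 | 2 | Gale, Mr. Shadrach | male | 34.0 | 1 | 0 | 28664 | 21.0000 | NaN | S
117 | 118 | 0 | 2 | Turpin, Mr. William John Robert | male | 29.0 | 1 | 0 | 11668 | 21.0000 | NaN | S
544 | 545 | 0 | 1 | Douglas, Mr. Walter Donald | male | 50.0 | 1 | 0 | PC 17761 | 106.4250 | C86 | C
763 | 764 | 1 | 1 | Carter, Mrs. William Ernest (Lucile Polk) | female | 36.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S
532 | 533 | 0 | 3 | Elias, Mr. Joseph Jr | male | 17.0 | 1 | 1 | 2690 | 7.2292 | NaN | C
246 | 247 | 0 | 3 | Lindahl, Miss. Agda Thorilda Viktoria | female | 25.0 | 0 | 0 | 347071 | 7.7750 | NaN | S
784 | 785 | 0 | 3 | Ali, Mr. William | male | 25.0 | 0 | 0 | SOTON/O.Q. 3101312 | 7.0500 | NaN | S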

Dataset B

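(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
571 | 572 | 1 | 1 | Appleton, Mrs. Edward Dale (Charlotte Lamson) | female | 53.0 | 2 | 0 | 11769 | 51.4792 | C101 | S
102 | 103 | 0 | 1 | White, Mr. Richard Frasar | male | 21.0 | 0 | 1 | 35281 | 77.2875 | D26 | S
706 | 707 | 1 | 2 | Kelly, Mrs. Florence "Fannie" | female | 45.0 | 0 | 0 | 223596 | 13.5000 | NaN | S
204 | 205 | 1 | 3 | Cohen, Mr. Gurshon "Gus" | male | 18.0 | 0 | 0 | A/5 3540 | 8.0500 | NaN | S
133 | 134 | 1 | 2 | Weisz, Mrs. Leopold (Mathilde Francoise Pede) | female | 29.0 | 1 | 0 | 228414 | 26.0000 | NaN | S
197 | 198 | 0 | 3 | Olsen, Mr. Karl Siegwart Andreas | male | 42.0 | 0 | 1 | 4579 | 8.4042 | NaN | S
103 | 104 | 0 | 3 | Johansson, Mr. Gustaf Joel | male | 33.0 | 0 | 0 | 7540 | 8.6542 | NaN | S
675 | 676 | 0 | 3 | Edvardsson, Mr. Gustaf Hjalmar | male | 18.0 | 0 | 0 | 349912 | 7.7750 | NaN | S
652 | 653 | 0 | 3 | Kalvik, Mr. Johannes Halvorsen | male | 21.0 | 0 | 0 | 8475 | 8.4333 | NaN | S
431 | 432 | 1 | 3 | Thorneycroft, Mrs. Percival (Florence Kate White) | female | NaN | 1 | 0 | 376564 | 16.1000 | NaN | S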

Dataset A

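(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
284 | 285 | 0 | 1 | Smith, Mr. Richard William | male | NaN | 0 | 0 | 113056 | 26.0000 | A19 | S
218 | 219 | 1 | 1 | Bazzani, Miss. Albina | female | 32.0 | 0 | 0 | 11813 | 76.2917 | D15 | C
290 | 291 | 1 | 1 | Barber, Miss. Ellen "Nellie" | female | 26.0 | 0 | 0 | 19877 | 78.8500 | NaN | S
403 | 404 | 0 | 3 | Hakkarainen, Mr. Pekka Pietari | male | 28.0 | 1 | 0 | STON/O2. 3101279 | 15.8500 | NaN | S
623 | 624 | 0 | 3 | Hansen, Mr. Henry Damsgaard | male | 21.0 | 0 | 0 | 350029 | 7.8542 | NaN | S
195 | 196 | 1 | 1 | Lurette, Miss. Elise | female | 58.0 | 0 | 0 | PC 17569 | 146.5208 | B80 | C
712 | 713 | 1 | 1 | Taylor, Mr. Elmer Zebley | male | 48.0 | 1 | 0 | 19996 | 52.0000 | C126 | S
664 | 665 | 1 | 3 | Lindqvist, Mr. Eino William | male | 20.0 | 1 | 0 | STON/O 2. 3101285 | 7.9250 | NaN | S
423 | 424 | 0 | 3 | Danbom, Mrs. Ernst Gilbert (Anna Sigrid Maria Brogren) | female | 28.0 | 1 | 1 | 347080 | 14.4000 | NaN | S
349 | 350 | 0 | 3 | Dimic, Mr. Jovan | male | 42.0 | 0 | 0 | 315088 | 8.6625 | NaN | S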

Dataset B

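(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
833 | 834 | 0 | 3 | Augustsson, Mr. Albert | male | 23.0 | 0 | 0 | 347468 | 7.8542 | NaN | S
732 | 733 | 0 | 2 | Knight, Mr. Robert J | male | NaN | 0 | 0 | 239855 | 0.0000 | NaN | S
432 | 433 | 1 | 2 | Louch, Mrs. Charles Alexander (Alice Adelaide Slow) | female | 42.0 | 1 | 0 | SC/AH 3085 | 26.0000 | NaN | S
323 | 324 | 1 | 2 | Caldwell, Mrs. Albert Francis (Sylvia Mae Harbaugh) | female | 22.0 | 1 | 1 | 248738 | 29.0000 | NaN | S
216 | 217 | 1 | 3 | Honkanen, Miss. Eliina | female | 27.0 | 0 | 0 | STON/O2. 3101283 | 7.9250 | NaN | S
772 | 773 | 0 | 2 | Mack, Mrs. (Mary) | female | 57.0 | 0 | 0 | S.O./P.P. 3 | 10.5000 | E77 | S
290 | 291 | 1 | 1 | Barber, Miss. Ellen "Nellie" | female | 26.0 | 0 | 0 | 19877 | 78.8500 | NaN | S
828 | 829 | 1 | 3 | McCormack, Mr. Thomas Joseph | male | NaN | 0 | 0 | 367228 | 7.7500 | NaN | Q
702 | 703 | 0 | 3 | Barbara, Miss. Saiide | female | 18.0 | 0 | 1 | 2691 | 14.4542 | NaN | C
89 | 90 | 0 | 3 | Celotti, Mr. Francesco | male | 24.0 | 0 | 0 | 343275 | 8.0500 | NaN | S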

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.
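
A duplicate-row check of this kind can be sketched in plain Python as follows. This is only an illustration that assumes rows are compared on all columns; the helper `duplicate_rows` and the toy rows are hypothetical and do not reproduce how the profiling library implements the check.

```python
from collections import Counter

def duplicate_rows(rows):
    """Return a {row: count} mapping for rows that occur more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

# Hypothetical toy rows: (PassengerId, Survived, Pclass).
rows = [(357, 1, 1), (755, 1, 2), (159, 0, 3)]

print(duplicate_rows(rows))                  # empty mapping: every row is unique
print(duplicate_rows(rows + [(357, 1, 1)]))  # the repeated row is reported with its count
```

Because PassengerId is unique in both datasets, no two full rows can be identical, which is consistent with the empty result reported above.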